Paired by the Teacher: Turning Unpaired Data into High-Fidelity Pairs for Low-Resource Text Generation

Lu, Yen-Ju, Thebaud, Thomas, Moro-Velazquez, Laureano, Dehak, Najim, Villalba, Jesus

arXiv.org Artificial Intelligence

We present Paired by the Teacher (PbT), a two-stage teacher-student pipeline that synthesizes accurate input-output pairs without human labels or parallel data. In many low-resource natural language generation (NLG) scenarios, practitioners have only raw outputs (highlights, recaps, or questions) or only raw inputs (articles, dialogues, or paragraphs), but seldom both. This mismatch forces small models to learn from very few examples or to rely on costly, broad-scope synthetic examples produced by large LLMs. PbT addresses this by asking a teacher LLM to compress each unpaired example into a concise intermediate representation (IR) and training a student to reconstruct inputs from IRs. Outputs can then be paired with student-generated inputs, yielding high-quality synthetic data. We evaluate PbT on five benchmarks: document summarization (XSum, CNNDM), dialogue summarization (SAMSum, DialogSum), and question generation (SQuAD), as well as an unpaired setting on SwitchBoard (paired with DialogSum summaries). An 8B student trained only on PbT data outperforms models trained on 70B teacher-generated corpora and other unsupervised baselines, coming within 1.2 ROUGE-L of human-annotated pairs and closing 82% of the oracle gap at one-third the annotation cost of direct synthesis. Human evaluation on SwitchBoard further confirms that only PbT produces concise, faithful summaries aligned with the target style, highlighting its advantage of generating in-domain sources that avoid the mismatch that limits direct synthesis.
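The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the teacher and student are toy stand-ins rather than LLMs, and the reading that the student is trained on (IR, input) pairs built from unpaired inputs and then applied to IRs of unpaired outputs is an interpretation of the abstract.

```python
from typing import Callable, List, Tuple

# Hypothetical type aliases for illustration only.
Compress = Callable[[str], str]     # teacher: text -> intermediate representation (IR)
Reconstruct = Callable[[str], str]  # student: IR -> synthetic input text


def build_student_training_set(
    raw_inputs: List[str], teacher_compress: Compress
) -> List[Tuple[str, str]]:
    """Stage 1 (assumed): pair each unpaired input with its teacher-written IR."""
    return [(teacher_compress(x), x) for x in raw_inputs]


def pair_outputs_with_inputs(
    raw_outputs: List[str],
    teacher_compress: Compress,
    student_reconstruct: Reconstruct,
) -> List[Tuple[str, str]]:
    """Stage 2 (assumed): compress each unpaired output to an IR, let the
    student generate an input from it, and keep (synthetic input, output)."""
    return [(student_reconstruct(teacher_compress(y)), y) for y in raw_outputs]


def toy_teacher(text: str) -> str:
    """Toy stand-in for the teacher LLM: the IR is just the first three words."""
    return " ".join(text.split()[:3])


def fit_toy_student(training_set: List[Tuple[str, str]]) -> Reconstruct:
    """Toy stand-in for student fine-tuning: memorize IR -> input, else use a template."""
    memory = dict(training_set)

    def student(ir: str) -> str:
        return memory.get(ir, f"[synthetic input for: {ir}]")

    return student


train_set = build_student_training_set(["the cat sat on the mat today"], toy_teacher)
student = fit_toy_student(train_set)
pairs = pair_outputs_with_inputs(["a short recap of the call"], toy_teacher, student)
```

In a real setup, `toy_teacher` would be replaced by prompts to a large teacher model and `fit_toy_student` by fine-tuning a smaller model on `train_set`; the resulting `pairs` would then serve as synthetic training data for the downstream generation task.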


Nancy Mace sees AI as a chance to improve border security: 'A lot of opportunity'

FOX News

GOP Rep. Nancy Mace spoke exclusively with Fox News Digital about her thoughts on the rapidly advancing AI sector as Congress races to get ahead of the burgeoning technology. EXCLUSIVE: Rep. Nancy Mace, R-S.C., is calling on the federal government to use artificial intelligence technology to better secure the southwestern border. During an interview with Fox News Digital, Mace suggested the rapidly advancing technology could be used to enhance border patrol agents' monitoring capabilities as border officials continue to see a record number of illegal aliens attempting to cross into the U.S. through Mexico. On one front, she said, AI could help better collect "biometrics of everyone that comes across the border, especially when we're talking about by land and illegally." "And if you're using AI to find their biometrics in a database or multiple databases, I believe it can be done in a much swifter fashion," the congresswoman explained. "I think that that kind of technology could be used when you're driving through the border.